Fast Deterministic Single-Linkage 2D-Spatial Cluster Analysis

Author

  • Daniel Goldbach

Abstract

Cluster analysis is a common task in data mining, machine learning and related fields. A plethora of clustering algorithms exist for this purpose, but many are prohibitively inefficient (e.g. quality-threshold clustering), non-deterministic (k-means) or utilise inherently lossy partitioning models (k-d tree clustering). Single-linkage hierarchical clustering is a form of cluster analysis that merges clusters based on the minimum distance between them under a given distance metric. Though more sophisticated clustering methods exist, single-linkage hierarchical clustering is intuitive and easy to implement, which makes it a reasonably common choice for cluster analysis. However, the general case of single-linkage clustering is O(n³) (though the SLINK algorithm runs in O(n²) time for some special cases [1]). A specific, and likely the most intuitive, case of cluster analysis is clustering performed on the two-dimensional Euclidean plane. This case has many real-world applications, including image analysis/segmentation and medical imaging. This paper presents a quasi-linear time algorithm for single-linkage hierarchical clustering of points in two-dimensional Euclidean space. The concept is not itself novel (see [2, 3]); however, using an agglomerative approach, as opposed to fixed-threshold edge filtering, provides a concise and effective way to extract a specific number of clusters. The algorithm also guarantees that the maximum distance between any pair of points in a cluster is minimised.

Body

Find the Delaunay triangulation of the points with Sweephull, then run Kruskal's MST algorithm on its edges until the required cluster count is reached. Total running time: O(n log n).
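
The second step of the body can be illustrated with a minimal Python sketch of Kruskal's algorithm stopped early at k clusters. It rests on stated assumptions: the function name `single_linkage_clusters` and its `candidate_edges` parameter are hypothetical, and the Sweephull Delaunay triangulation is not implemented here. When no candidate edges are supplied, the sketch falls back to all O(n²) point pairs. Because the Euclidean minimum spanning tree is a subgraph of the Delaunay triangulation, passing the Delaunay edges instead should give the same clusters (up to ties between equal-length edges) while reducing the sort to O(n log n).

```python
import math
from itertools import combinations


def single_linkage_clusters(points, k, candidate_edges=None):
    """Split 2D points into k single-linkage clusters via Kruskal's algorithm.

    points: list of (x, y) tuples.
    k: number of clusters wanted (1 <= k <= len(points)).
    candidate_edges: optional iterable of (i, j) index pairs (hypothetical
        parameter). The paper takes these from a Delaunay triangulation
        built with Sweephull; this sketch falls back to all pairs, which
        gives the same clusters (up to ties) but not the O(n log n) bound.
    Returns a list of clusters, each a sorted list of point indices.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    if candidate_edges is None:
        candidate_edges = combinations(range(n), 2)

    # Sort candidate edges by Euclidean length (Kruskal's order);
    # ties are broken by point index, so the result is deterministic.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j) for i, j in candidate_edges
    )

    # Union-find with path halving tracks the current clusters.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    clusters = n  # every point starts as its own cluster
    for _, i, j in edges:
        if clusters == k:
            break  # stop merging once the requested count is reached
        ri, rj = find(i), find(j)
        if ri != rj:
            # The shortest edge between two different clusters is their
            # single-linkage distance, so these two merge next.
            parent[ri] = rj
            clusters -= 1

    # Group point indices by their cluster root.
    groups = {}
    for idx in range(n):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())
```

For example, with `pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]`, calling `single_linkage_clusters(pts, 3)` returns `[[0, 1, 2], [3, 4], [5]]`. Stopping Kruskal's algorithm after n − k merges is equivalent to building the full minimum spanning tree and deleting its k − 1 longest edges.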

Journal:
  • TinyToCS

Volume 1

Publication date: 2012